ABSTRACT

This paper makes the case that mandated tests, such as those 
administered through a state’s accountability system, can best meet the goal 
of curricular reform by making their domains of learning targets transparent 
to users. Test maps are proposed as an effective device to accomplish that 
goal. A test map describes the content of the test and how it is sampled to 
produce each form. To illustrate the use of test maps, modified to be used as 
a unifying device to express an achievement domain in unambiguous terms, the 
paper draws on the example of Maryland’s assessment and accountability
program. Following suggestions to develop test maps would have little impact 
on how tests are developed and scored, but it is suggested that they would 
have a large impact on how they are described and used, and ultimately on 
their effectiveness as agents of curriculum reform. The paper also makes some 
suggestions about assessment program design and product development in order 
to create an information-rich classroom environment that capitalizes on the 
new domain descriptions. An appendix presents information from the Maryland 
Web site about assessment limits in Biology. (SLD)

A fundamental assumption of assessment and accountability programs is that teachers
will attempt to adjust the content of their curricula to mirror the tests their students will 
take. In order to effect positive instructional change, these tests are supposed to conform, 
at least in part, to the desired curriculum’s learning targets. It is argued here that 
mandated tests, such as those administered through state accountability systems, can best 
meet the goal of curricular reform by making their domains of learning targets 
transparent to users. Test maps are proposed as an effective device to accomplish that 
goal. Following these suggestions would have little impact on how tests are developed or
scored, but we believe they would have a large impact on how they are described and 
used, and ultimately on their effectiveness as agents of curriculum reform. Finally, we 
describe some suggestions about assessment program design and product development in 
order to create an information-rich classroom environment that capitalizes on the new 
domain descriptions. 



Need for Domain Description 

We argue here that if assessments are to direct reform, the achievement targets that 
constitute the domain of each of these tests must (a) be a legitimate domain of 
achievement targets (by this, we mean that agreement has been reached using an accepted 
process), (b) be sufficiently described to be communicated effectively to others, 
especially instructional personnel, and (c) be reliably sampled by the test (i.e., not only
does the test sample the domain well, but also, teachers believe it will sample the domain 
well). Each of these is discussed briefly. 

(a) Be legitimate. Like most people, educators work more effectively if they 
believe their goals are worthy. In education, that means the value of the targets of 
instruction is apparent. While each of us can and often does make judgments
according to our own beliefs about any set of curricular goals (i.e., learning 
targets), harnessing the efforts of schools, districts, and an entire state requires a 
shared belief in the worth of the goal. In our democratic society, that requires a 
process, usually political, in order that a consensus may be attained. For example, 
in developing goals in some content area, a process that effectively includes 
representation of teachers will be better accepted by educators than one which 
does not. The state school board, representing a broad constituency of 
stakeholders, would be an accepted authority to approve both the process and the 
product.

(b) Be effectively described. A test is supposed to assess what students are to
know and are able to do with what they know. Actually, though, every 
achievement test item prompts both these elements because it requires a student to 
do something with something. Classroom assessment textbooks typically attempt 
to operationalize this point in how they recommend domain descriptions of tests 
be done. Normally, a table of specifications is the device used to describe each 
item in terms of its content and process dimensions. That is, what the student 
must know and what he or she is to do with that knowledge is described by 
combinations of content (e.g., rows) and process (e.g., columns) in a table of 
specifications. 

But it is argued here that a table of specifications is inadequate to communicate 
the domain of a test to those in the field who need to understand it in terms of the 
instructional targets it represents. Both dimensions of the table are too imprecise. 

The content dimension typically is not sufficiently detailed to determine the 
extent of that which students must know. The ambiguity is tolerated in order to 
make the table less cumbersome. Elaborations of the content elements are 
necessary in order that the assessment specialists who write tests and the 
educational specialists who use them agree on the scope of the knowledge 
elements. 

Similarly, the process dimension requires clarification. It is generally 
acknowledged as inadequate among assessment professionals that students be 
asked only to recall content knowledge, but specifying the higher-order reasoning 
that is to be included in an instructional domain is not straightforward. Likely, 
different educators would disagree on even the way higher-order thinking should 
be described. For example, Nitko (2001) describes four approaches in his 
introductory classroom assessment text. Nevertheless, a complete domain 
description should indicate not only what students are to know, but also what they 
are expected to be able to do. Otherwise, educators (especially curriculum 
developers and teachers) and assessors will not be working toward the same 
domain. 

Even a highly motivated educator cannot attain a goal that is unclear. Some way 
is needed to clarify the domain of each test so it can communicate unambiguous 
targets in combinations of both content and process dimensions. We will suggest 
below a way to clarify both dimensions. 

(c) Be reliably sampled. Everyone agrees that the domain of any assessment 
should be sampled representatively on each test form. However, teachers who 
have worked with mandated assessments often do not feel the test covers what 
they have been teaching, even when they have honestly represented their district’s 
curriculum instructionally. Perhaps they are often right. The connection between 
the tested domain and the educators’ learning targets needs to be established at the
start of the appropriate instructional sequence. Not only must the educator
understand the domain, but he or she must also believe the test will sample it
appropriately. Otherwise, the test will be marginalized as irrelevant and any
motivation expected as a result of the assessment and accountability program will 
have been lost. 

In summary, a testing program is effective as a guide to instructional goals to the extent 
that it covers a publicly accepted learning domain that is described in terms of both 
content and cognition. 



Test Maps 

A test map, which is usually more specific than a table of specifications, describes the 
content of the test and how it is sampled to produce each form. Examples of test maps 
may be found at the web page http://mdk12.org/mspp/high_school/look_like/index.html
where there are several sample tests that follow test maps for a high school assessment 
program. We propose a modification so that a test map may be used as a unifying device 
to express a legitimate achievement domain in unambiguous terms and to ensure not only 
that any form of the test will sample that domain appropriately, but that educators will 
anticipate appropriate coverage (these conform to our three principles, above). Because 
we know it best, we use Maryland’s assessment and accountability program as our 
illustration. We point out where Maryland’s program illustrates our recommendations, 
but the majority of our suggestions have not to our knowledge been implemented 
anywhere and would apply just as well to Maryland as they would to any other such 
program. 

Example of Current Practice 

We use an example from Maryland’s State Content Standards to illustrate some of the
aspects of our proposal. The full State Content Standards may be found at
http://mdk12.org/mspp/standards/index.html and from there links may be followed to any
of the cited material that follows. 

The State Content Standards describe the expected domain of education for students in 
the state in four content areas: Language Arts, Mathematics, Science, and Social Studies. 
Our example will be taken from Mathematics.

Within each content area, there are several outcomes. We will use the outcome level as 
the degree of specificity of content in our example. In Mathematics, there are ten 
outcomes: (1) Knowledge of Algebra, Patterns, and Functions, (2) Knowledge of 
Geometry, (3) Knowledge of Measurement, (4) Knowledge of Statistics, (5) Knowledge 
of Probability, (6) Knowledge of Number Relationships and Computation, (7) Process of 
Problem Solving, (8) Process of Communication, (9) Process of Reasoning, (10) Process 
of Connections. We will use Knowledge of Probability in our example.

Within each outcome there are several indicators. The indicator is the lowest level of
specificity in the statement of the domain that is to be represented by Maryland’s tests. 

In probability at the eighth-grade level, there are five indicators that build upon the three 
indicators at the fifth-grade level, which in turn build upon the three indicators at the 
third-grade level. This sort of representation of an instructional domain is likely not 
unusual. Assessments are commonly constructed using the indicators, as has been done 
in Maryland; see the link to the test maps, above. 

Among the eighth-grade extensions is the indicator “find the probability of simple 
dependent and independent events using various methods including constructing a sample 
space.” It is likely that a statement like this is fairly typical, at best, of the level of detail 
in most state descriptions of learning targets. But it is our contention that a statement like 
this is inadequate to describe for a teacher what needs to be covered during instruction. 

Should you disagree and feel the statement is adequate, let’s say you are asked to teach 
students to “find the probability of simple dependent and independent events.” What do 
you include? Do you express probability as a ratio of equally likely events, as the limit of 
repeated samples, as a degree of belief, or as some combination of these? Do you define 
simple and compound events? Do you express dependence as limiting the sample space
to a subpopulation or do you define unions, intersections, negations and dependence vs.
independence and then use computational formulas for probabilities? Do you teach Venn 
diagrams? Do you teach your students to use two-way arrays for computing probabilities 
of conditional events? These questions are important; they speak directly to what students 
will be asked to do in the classroom and on the test. The answers to these questions are 
crucial for alignment to exist between instruction and assessment. But how should you as 
a teacher answer questions such as these when the state’s description of the domain is 
silent on them? 

One approach is to guess. Indeed, what else can you do? But there is no guarantee that 
the test will cover the domain the way that you decided to teach it. Of course, you should 
look at your district’s curriculum materials. But even if they answer these questions, 
there is no guarantee that they cover the same domain as the test, since they were written by
educators who were also guessing about these and similar questions. All they do for
consistency is help you make some of the same guesses as do the other teachers in the 
district, which will almost certainly vary district-to-district. 

Parenthetically, it is sometimes mentioned that some ambiguity is helpful, since it 
encourages teachers to teach a broader array of material. Certainly, some may do that. 
Others may decide to concentrate on one, but not all ways to cover an indicator. There 
may be other solutions to get from an ambiguous indicator to an individual teacher’s
instructional goals. But that seems a rather haphazard approach to curriculum. We can 
easily imagine that some students will not have had an opportunity to learn material that 
will be on the state test due to a misunderstood content domain. Others may have 
stretched their learning beyond the scope of the test’s content domain, so that the test 
under-samples their achievements. While allowing a district, a school, or a teacher to 
enhance the scope of its curriculum beyond the objectives of the state is to be
encouraged, to do so through planned ambiguity seems a poor policy. To the extent that
the domain of the test is ambiguous, students’ opportunity to learn the tested domain
becomes haphazard.

Assessment Limits

Since the state is the authority that is entrusted with accountability testing, it is the state’s 
responsibility to ensure that teachers understand the scope of the indicators that are to be 
taught and that the domain of the test agrees exactly with the extent of these limits. 
Without these two requirements, we cannot make inferences about the causes of low test 
scores, nor will we be able to do much to improve them. Just as learning targets need to 
be clarified for students and then assessed as they have been clarified, so also is it 
necessary for teachers to understand their instructional targets in order to hit them. 

As Baker (2002) recently noted, defining “the operational limits of the target domain of 
learning” is a necessary condition for using assessments effectively for both 
accountability and for school improvement. As we noted above, too much ambiguity
clearly exists in the original statement of our example indicator. We could make the 
same points about most other indicators, as well. 

Let us explore how indicators may be made more explicit. We will introduce and then
elaborate upon the concept of Assessment Limits (a term borrowed from Maryland’s 
assessment programs; see the Appendix for Maryland’s statement about assessment limits 
as they are used in its biology testing program). Assessment limits as presently used in 
the Maryland high school assessments specify the exact content that may appear on the 
test. When developed properly, they define what is and is not “fair game” for the 
assessment. As implemented by Maryland, the Assessment Limits represent statewide
consensuses that were developed with broad teacher representation. The Assessment 
Limits are widely disseminated and are used in item and test development. 

Here is an example of Assessment Limits for our example indicator developed by 
explicitly enumerating its components. A few of the items actually apply also to other
indicators in the eighth-grade mathematics outcomes, but are included anyway so they 
appear more internally coordinated. 

1. A universe is the entire collection of outcomes that may occur.

2. An event is an occurrence or outcome that satisfies a condition. 

3. Mutually exclusive events may not occur together in one outcome. 

4. The condition that defines an event may be simple (based on one characteristic) or 
compound (based on two or more characteristics). 

5. Compound events (or conditions) are derived from simple events by parentheses 
and the operators union (∪), intersection (∩), and negation (~). The
assessment is limited to at most three events, two unions, two intersections and
one negation in any compound event.

6. Probability of an event, P(A), is the frequency of occurrences that satisfy the
event (A) over the total frequency of occurrences. This definition requires an 
assumption that all occurrences are equally likely. 

7. Probability of an event is also the limit of the ratio of the number of times the event
occurred to the number of trials as the number of trials increases. This definition
requires an assumption that the trials are independent. 

8. Express a universe of two sets of mutually exclusive events as a two-way table of 
frequencies or probabilities. Annotate the table to show derivable compound
events in the cells and in the margins. 

9. Express a universe of three sets of mutually exclusive pairs of events as a Venn 
diagram. Annotate the diagram to show derivable compound events for all 
regions. 

10. The relation “given” (|) limits the universe to outcomes satisfying the event 
following the relation. 

11. Two events are independent if the probability of one is unaffected by the 
(non)occurrence of the other. That is, A and B are independent if P(A|B) = P(A).

12. P(A ∩ B) = P(A) · P(B|A).

13. P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

14. P(A|B) = P(A ∩ B) / P(B).
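
As a rough check on how limits 6, 8, and 10 through 14 fit together, the identities can be verified numerically on a small two-way table of frequencies. The sketch below is only an illustration: the counts in `table` and the helper `prob` are hypothetical and are not drawn from any Maryland material.

```python
from fractions import Fraction

# Hypothetical two-way table of frequencies (limit 8).
# Keys are (row, column) cells; the counts are invented for illustration.
table = {
    ("A", "B"): 12, ("A", "not B"): 18,
    ("not A", "B"): 8, ("not A", "not B"): 12,
}
total = sum(table.values())

def prob(event):
    """Limit 6: frequency of cells satisfying the event over total frequency."""
    favorable = sum(count for cell, count in table.items() if event(cell))
    return Fraction(favorable, total)

p_a = prob(lambda cell: cell[0] == "A")
p_b = prob(lambda cell: cell[1] == "B")
p_a_and_b = prob(lambda cell: cell == ("A", "B"))
p_a_or_b = prob(lambda cell: cell[0] == "A" or cell[1] == "B")

# Limits 10 and 14: "given" restricts the universe to the conditioning event.
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Limit 12: P(A ∩ B) = P(A) · P(B|A)
assert p_a_and_b == p_a * p_b_given_a
# Limit 13: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert p_a_or_b == p_a + p_b - p_a_and_b
# Limit 11: A and B are independent if P(A|B) = P(A)
independent = p_a_given_b == p_a
```

With these invented counts the two events happen to be independent; changing a single cell count would make `independent` false while leaving limits 12 through 14 intact.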

While these limits are painstaking to enumerate, they communicate an unambiguous sub- 
domain to a teacher (and to an item writer). Of course, these are just our example. They 
do not represent the position of Maryland on the meaning of that indicator. But the 
mathematics education community could easily reach consensus on a similar list for this 
and all other indicators. These would then become at once an “at least list” for teachers 
and an “at most list” for test developers who are writing items for that indicator 
representing the core knowledge embodied in the indicator. The list defines the content 
of test items that may be used to represent the content of the indicator. The process also 
must ensure that no element on the list has a zero probability of appearing on the overall 
assessment. Otherwise, the assessment limits would not represent the realized 
assessments. 
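
The two roles of the list, and the requirement that no element have zero probability of appearing, can be pictured as a simple audit of an item pool. The following is a minimal sketch under stated assumptions: the limits are numbered 1 through 14 as in our example, and the item names, their tags, and the function `audit_pool` are hypothetical. An item in the pool is only a necessary condition for a limit to be sampled; actual sampling rules would still have to give it a nonzero chance.

```python
# Hypothetical: content limits numbered 1..14, as in the example list.
assessment_limits = set(range(1, 15))

# Hypothetical item pool: each item is tagged with the limits it draws on.
item_pool = {
    "item_01": {1, 2},
    "item_02": {3, 4, 5},
    "item_03": {6, 12},
    "item_04": {8, 10, 14},
    "item_05": {11, 13},
    "item_06": {9, 15},  # 15 is not an agreed limit
}

def audit_pool(limits, pool):
    """Return (out_of_bounds, never_sampled).

    out_of_bounds: items tagged with content outside the limits,
                   which would violate the "at most list".
    never_sampled: limits that no item covers, so they could never
                   appear on any form built from this pool.
    """
    out_of_bounds = {name for name, tags in pool.items() if not tags <= limits}
    covered = set().union(*pool.values())
    never_sampled = limits - covered
    return out_of_bounds, never_sampled

out_of_bounds, never_sampled = audit_pool(assessment_limits, item_pool)
```

In this invented pool, one item reaches beyond the agreed limits and one limit is never covered, which are the two kinds of mismatch the process is meant to prevent.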

Note also that this sample of assessment limits defines a sub-domain that is 
instructionally meaningful to assess. A sub-score would carry implications for individual 
student remediation as well as for modifying an instructional program. Some others of 
the indicators for the probability outcome might be amalgamated into this sub-domain 
and remain instructionally meaningful, as well. It would be valuable for a state or other 
appropriate education unit to organize each of its content domains around instructionally
valuable sub-domains and thus to be able to generate interpretable sub-scores from them. 

Extended Assessment Limits and Heuristics 

Identifying content assessment limits only gets us part of the way toward being able to 
communicate achievement targets to teachers and stakeholders in terms of what students 
need to know and be able to do. It is also necessary to describe the cognitive activities 
that constitute the behavioral part of each student outcome. We will describe a possible
way to include cognition in a test map and thus to meet the two criteria (domain
understanding and assessment representation) we have described as necessary (though
certainly not sufficient) for effective tests.

One of the problems we face is that there is no accepted codification of cognition. The
Bloom, Engelhart, Furst, Hill, & Krathwohl (1956) taxonomy is best known among
educators, but it has been reported to be difficult to use, does not correspond to valued
cognition outcomes such as problem solving, and is not universally taught in teacher
preparation. That it has been updated recently (Anderson & Krathwohl, 2000) may help
make it more useful, but at the same time contributes to a lack of consistency. Among 
those referenced by Nitko (2001), Marzano, Pickering & McTighe (1993) seems to be 
most useful for our purposes. They identify thirteen reasoning strategies that seem to be 
more consistent with publicly valued outcomes (e.g., one of them is problem solving). 

As Nitko (2001) points out, each of these outcomes has implications for assessment.

Another problem is that the descriptive language of cognition is not consistent across 
disciplines. Nationally each discipline has a unique perspective. If cognition were to be 
described equivalently across subject matter areas, then we think teachers would naturally 
become more focused on teaching cognition (and likely metacognition) in order to teach 
efficiently and to draw parallels among disciplines. However, the structure and language 
of our current domain specifications at the national level are not at all equivalent in their
representation of cognition, in part because we lack a nationally accepted taxonomy. It
is likely naive to expect any state to define any content domain in a way that is 
dramatically different from its national parallel(s), so the obvious approach to 
representing cognition by using some codification of thinking and applying it to all 
content standard descriptions is likely doomed. 

Rather, we suggest application of an endorsed codification at the level of content 
assessment limits. Our suggestion is to ask the same state-level content experts who
agreed on the content assessment limits to recommend, for each limit, the elements in an 
accepted taxonomy for defining the cognition assessment limits that will be “fair game” 
to be assessed. Their deliberations should be conducted with the input of experts in 
cognitive processes. For example, say we used the Marzano et al. (1993) categories with 
the “finding probability” indicator’s content limits described above. Then we would take 
each limit and ask which cognitive categories will be assessed. For the first content limit, 
which is “a universe is the entire collection of outcomes that may occur,” it seems to us to 
make sense to ask students to 

1. generalize a statement of a universe from a description of its elements (the
taxonomy category name is induction), 

2. identify whether new elements are or are not members of a given universe (the 
taxonomy category name is deduction), 

3. correct a misstatement of universe (the taxonomy category name is error 
analysis), and 

4. explain why a given statement of a universe is adequate for a given purpose (the 
taxonomy category name is constructing support).

While the taxonomy was used to make sure all aspects of cognition were considered (at
least according to this codification), it seems reasonable to select those activities that are 
considered most relevant to the content limit. We now have four statements that describe 
specific ways to use the definition of a universe that can guide both teachers and test 
developers. Note that the statements are at the appropriate level of generalization for 
what have elsewhere been called “heuristics” (Schafer, 2002). They are specific enough 
for making judgments about whether test prompts measure them, but are general enough 
so that there are virtually an unlimited number of such prompts that could be written. 
These criteria are borrowed from Kerlinger (1990), who argued that an effective
conceptual definition of a construct should be general enough to allow multiple 
operational definitions, but specific enough that the validity of any given operational 
definition will be apparent. We will borrow the term “heuristic” for each of these 
statements and apply criteria parallel to Kerlinger’s. 

How many such heuristics would there be for an outcome? In the Maryland eighth-grade 
mathematics standards for the probability outcome, there are five indicators (see above). 
We identified fourteen content assessment limits, but some of these represent other 
indicators; there is some degree of overlap among the indicators. It seems reasonable to
assume that there are five unique content assessment limits for an average indicator. We
found that four cognition assessment limits apply to the first content assessment limit. Using
that as a typical number, we estimate that a total of 5 × 5 × 4 = 100 heuristics might apply
to a typical outcome. This seems manageable to us as a domain for an assessment, but if 
the teacher content experts feel it is too ambitious for instruction, then part of their task 
should be, by consensus, to pare the list down to its essentials. The intent is to describe 
the appropriate educational domain for instruction. The state-determined assessment, 
then, will appropriately represent the agreed-upon learning targets. 
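
The estimate above is a simple product, and it may help a consensus panel to see how the total moves when one of the three factors is pared down. This is a minimal sketch: the function name `estimate_heuristics` is hypothetical, and the alternative figure of three cognition limits is an invented what-if, not a recommendation.

```python
def estimate_heuristics(indicators, content_limits_per_indicator,
                        cognition_limits_per_content_limit):
    """Number of heuristics for one outcome, assuming every indicator has the
    same number of content limits and every content limit the same number of
    cognition limits (a simplifying assumption)."""
    return (indicators
            * content_limits_per_indicator
            * cognition_limits_per_content_limit)

# Figures from the text: 5 indicators, 5 content limits each, 4 cognition limits each.
typical = estimate_heuristics(5, 5, 4)

# Hypothetical what-if: the panel keeps only 3 cognition limits per content limit.
pared = estimate_heuristics(5, 5, 3)
```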

For a complete domain description, the heuristics may be augmented further with
assessment examples to help communicate their precise intent. For example, for the
induction example, one could ask a student to display a universe of simple outcomes 
when two eight-sided dice are to be thrown. The items should correspond to the types of 
items that will be used on the state’s summative assessments. 
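
For readers who want to see the answer to that sample prompt, the universe for two eight-sided dice can be enumerated directly. The sketch below assumes dice numbered 1 through 8 and treats the two dice as distinguishable; the follow-on event (a sum of 9) is our own hypothetical extension of the prompt.

```python
from fractions import Fraction
from itertools import product

# Assumption: each die shows 1..8, and (first, second) is an ordered pair.
faces = range(1, 9)
universe = list(product(faces, repeat=2))

# Hypothetical follow-on event: the two faces sum to 9.
sum_is_nine = [outcome for outcome in universe if sum(outcome) == 9]

# Limit 6: favorable outcomes over total outcomes, assuming equally likely outcomes.
p_sum_is_nine = Fraction(len(sum_is_nine), len(universe))
```

The universe has 64 equally likely ordered outcomes, eight of which sum to 9.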



Summative and Formative Assessments 

A state’s summative assessments could be administered separately over the
instructionally valuable sub-domains (e.g., outcomes) discussed earlier. Indeed, students
might even pass (or fail) content areas based upon scores earned over assessments of
sub-domains (e.g., outcomes such as “probability”), which might be administered when
students are deemed as ready for them by their teachers. Currently, all sub-domains are 
taken all at once, such as at the end of the year. 

Computers can play a crucial role in individualizing assessments in order to make
on-demand assessment at the sub-domain level feasible and to control item overexposure. If
overall content scores are needed (e.g., to generate a proficiency level outcome for
accountability purposes), the sub-domain scores could be aggregated up for that purpose. 
The on-demand, individualized summative sub-domain assessments could be 
administered by school testing coordinators at secure sites in schools. 
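
One simple way the aggregation might work is sketched below. Everything in it is an assumption made for illustration: the sub-domain names, the point values, and the rule itself (total points earned over total points possible). A state could just as well weight sub-domains differently or aggregate scaled scores.

```python
# Hypothetical sub-domain results for one student: (points earned, points possible).
sub_domain_scores = {
    "algebra": (14, 20),
    "geometry": (9, 12),
    "probability": (6, 8),
}

def aggregate(scores):
    """Overall proportion correct: total earned over total possible.

    This weights each sub-domain by its number of points, which is only
    one of several defensible aggregation rules.
    """
    earned = sum(e for e, _ in scores.values())
    possible = sum(p for _, p in scores.values())
    return earned / possible

overall = aggregate(sub_domain_scores)
```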

Companion formative assessments should be available so that teachers have the resources
to judge readiness in “real time,” according to each teacher’s schedule of instructional 
decisions that need to be made. The state is the appropriate agency to develop these since 
it is the “owner” of the test maps. But teachers could play a valuable role in developing 
and disseminating their own materials through state-administered channels.

The model used in academic fields to review and disseminate scholarly works could work 
here. For example, a state might establish a process by which teacher-developed
formative assessments that correspond to test maps at the sub-domain level could be
forwarded to a refereeing board. Developmental work could even be supported by the 
state through solicited or unsolicited grants to individual teachers or to teacher groups. 
The formative assessment submissions might require some data to document 
effectiveness, perhaps, for example, involving an independent tryout of the assessments 
in other classrooms. The board might review the submissions by a standard process, 
perhaps making recommendations to the author(s), and accept (or reject) the resulting 
documented formative assessments for dissemination throughout the state. Accepted
assessments then could be made available throughout the state using a searchable 
database. The entire process could be accomplished in-house or through a vendor. 

Of course, there should be some incentives for teachers to produce acceptable formative 
assessments. In the academic world, the rewards are recognition (prestige), salary, and 
promotion. Some combination of similar rewards could be attached to successful 
productivity (through acceptance and dissemination) of formative assessments by 
teachers. The teacher-author’s school and district could appear along with his or her 
name(s) to provide institutional incentive for productivity, much as is done in a
university. An interesting by-product could be an increase in the professionalism of
teaching and perhaps enhanced job satisfaction of successful teachers.

Conclusion 

If we are to see any substantive improvements in student achievement as a result of 
assessment and accountability, we must be able to have a significant impact in the 
classroom. After more than 10 years of assessment-driven reform at the state level in
Maryland, we believe perhaps even a majority of the state’s teachers do not yet 
understand what proficient student work looks like in the same way it is understood at the 
state level. We have noticed that virtually everyone who observes classrooms seems to 
come away with a similar conclusion. 

But fault does not lie with teachers. We at the state level have not established the link 
between content standards and day-to-day student performance. We are convinced that 
Maryland is not alone. Most if not virtually all other states have also failed to “unpack”
their standards and indicators so that they are understandable as guides for classroom
instruction. Rethinking the way we deliver summative and formative assessments is a 
logical extension of this argument, one that treats assessments as instructional tools. 

The critical first step in integrating meaningful assessment strategies into day-to-day 
instruction is full articulation of the content and cognition of content domains. We 
believe it would best serve its education community for a state to represent its augmented 
assessment limits, both the statements of content and their elaborations (heuristics), in a 
test map that will serve to explain to teachers what exactly is fair game for the test and 
that will serve to guide test developers as they create new editions, or forms, of the state’s 
assessments. The map should enumerate all the content and cognition limits, should 
describe what formats will be used to represent them, should specify how and with what 
frequency, or probability, they will be sampled, and should explain what sub-scores will
be developed (and how they are developed) from the student responses. Since no
unreleased test items are needed, the map would not violate test security and could (we
believe it should) be freely available throughout the state.
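
To make the components of such a map concrete, the sketch below represents each heuristic as one entry carrying its content limit, cognition limit, item format, relative sampling weight, and sub-score, and then draws the slots of a form from those weights. The class `MapEntry`, the function `draw_form`, and all entry values are hypothetical; a real map would also need rules for balancing forms that this sketch does not attempt.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class MapEntry:
    content_limit: str   # what the student must know
    cognition: str       # what the student does with it (taxonomy category)
    item_format: str     # how the heuristic is represented on the test
    weight: float        # relative sampling frequency; must be positive
    sub_score: str       # sub-score the response contributes to

# Hypothetical entries for the probability outcome.
test_map = [
    MapEntry("universe", "induction", "selected response", 1.0, "probability"),
    MapEntry("universe", "error analysis", "constructed response", 0.5, "probability"),
    MapEntry("conditional probability", "deduction", "selected response", 2.0, "probability"),
]

def draw_form(entries, n_items, seed=None):
    """Draw n_items heuristic slots for one form, in proportion to the weights.

    Rejects non-positive weights so that no entry in the map has a
    zero probability of appearing.
    """
    if any(entry.weight <= 0 for entry in entries):
        raise ValueError("every map entry needs a positive sampling weight")
    rng = random.Random(seed)
    return rng.choices(entries, weights=[e.weight for e in entries], k=n_items)

form = draw_form(test_map, n_items=10, seed=0)
```

Because the map contains only heuristics and sampling weights, not items, publishing a structure like this would not expose any secure test content.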

While this seems like a tall order, a state has the resources and, we argue, the 
responsibility to achieve a consensus among stakeholders around the content to be 
assessed. Whether or not they are articulated or even thought consciously about, 
decisions at the level of detail we describe must be made to build any test since every test 
item must ask a student to do something with something. Every test represents 
someone’s understandings about the domain and its limits. By developing and using an 
explicit map, a healthy consensus about what should be taught will be reached, teachers 
and other stakeholders will no longer be guessing about test content, test developers will 
produce assessments that represent appropriate domains, sub-scores will represent 
meaningful information for both students and programs, and there will be no surprises in 
the assessment system. A state-developed system of articulated summative and formative 
assessments, representing explicit test maps expressed in terms of heuristics, can foster 
the clarity of achievement targets necessary for teachers and other educators to develop 
more efficient instructional activities and, ultimately, to achieve effective and documented 
school reform. 
Appendix 

“Understanding Assessment Limits” in Biology from the Maryland Web Site 

The Science Core Learning Goal (CLG) document is used for both assessment and 
instruction. The “assessment limits” included in the 1999 document were derived from 
the “at least” list that was included in the 1996 document. Assessment limits help clarify 
what a student will be asked to know, what a teacher will be asked to teach, and the 
content from which test questions will be drawn. The Maryland State Board of Education 
(MSBE) requires that all students have the opportunity to learn content about which they 
will be assessed. The clarification of content in the assessment limits supports this 
requirement. 

Assessment limits can be thought of in two ways: for instruction, they represent the 
minimum content that must be taught (the course must include at least the content 
outlined by the assessment limits); for assessment, they represent the maximum domain 
from which test questions will be developed (assessment limits identify the content which 
is fair game for the development of test items). All assessment items developed for the 
High School Assessments will be drawn from the assessment limits. However, not every 
assessment limit will be tested on every form of the test. 
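
The two readings of an assessment limit can be stated as a simple set check. The Python sketch below assumes each item on a form is tagged with the identifier of the limit it was written to; the identifiers, item labels, and function name are hypothetical and do not come from the Maryland documents.

```python
def check_form(item_limits, assessment_limits):
    """Compare one form against the assessment limits.

    item_limits: dict mapping an item label to the limit it was written to.
    assessment_limits: set of all limit identifiers (the maximum domain).
    Returns (out_of_bounds, untested) as two sorted lists.
    """
    # Items tagged with a limit that is not in the published domain.
    out_of_bounds = sorted(
        item for item, limit in item_limits.items()
        if limit not in assessment_limits
    )
    # Limits that this particular form does not sample.
    untested = sorted(assessment_limits - set(item_limits.values()))
    return out_of_bounds, untested


# Assumed example values for illustration only.
limits = {"3.3.1a", "3.3.1b", "3.3.4"}
form_a = {"item 1": "3.3.1a", "item 2": "3.3.4"}

out_of_bounds, untested = check_form(form_a, limits)
print(out_of_bounds)  # items outside the limits
print(untested)       # limits this form happens not to sample
```

An empty first list means the form stays inside the maximum domain. A non-empty second list is acceptable, since not every assessment limit is tested on every form.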

There are five science Core Learning Goals: 

• Goal 1: Skills and Processes 

• Goal 2: Concepts in Earth/Space Science 

• Goal 3: Concepts in Biology 

• Goal 4: Concepts in Chemistry 

• Goal 5: Concepts in Physics 

The skills and processes in Goal 1 are essential to science learning and will be assessed 
with each of the other four goals. In Goal 1, the indicators and the assessment limits are 
identical. Those marked “NT” will not be assessed on the biology test. However, they 
are still appropriate for instruction and other types of formative assessments. 

The assessment limits included in Goal 3 (Concepts in Biology) are a subset of the 
concepts that should be covered in a biology course. Goals 2, 4, and 5 do not include 
“assessment limits,” per se. Since these content areas will not be assessed in Phase 1, they 
have not as yet been revised. Instead of assessment limits, these goals still contain an “at 
least” list. As Maryland develops assessments for Goals 2, 4, and 5, their “at least” 
designation will also be changed to assessment limits. 

An illustration of assessment limits follows. In the biology CLG, Expectation 3.3 deals 
with genetics. Indicator 1 states that, “The student will demonstrate that the sorting and 
recombination of genes during sexual reproduction has an effect on variation in 
offspring.” The two assessment limits which follow indicator 3.3.1 state: 
• meiosis (chromosome number reduced by one-half; crossing-over may occur) 

• fertilization (combination of gametes). 

Therefore, test questions derived from biology indicator 3.3.1 may include questions 
about how fertilization is related to variation in sexually reproducing organisms, 
specifically, the role of meiosis in producing gametes, in reducing chromosome number, 
and the inheritance of new traits that result from crossing-over. Test questions may not 
include items dealing with the steps of meiosis, the identification of structures present in 
cells during meiosis, or the structure of the organs or organ systems where meiosis 
occurs. 

Vocabulary that is essential to understanding the concept being assessed may appear in 
an item, but vocabulary that relates to explicit details not essential to the understanding of 
an overall concept will not. For example, knowledge of trophic levels is critical to 
understanding food webs (3.5.4), but knowledge of Turner's Syndrome is not essential to 
understanding the effects of an abnormal number of chromosomes on an organism 
(3.3.4). 

Some critics may say that the use of assessment limits means teachers will be "teaching 
to the test." However, the phrase "teaching to the test" is a misnomer. 
Obviously, one cannot teach to a test since the test questions are not known. What 
teachers really do is teach to a target, the local school system curriculum, and devise 
appropriate assessments (tests) to check how well the students have learned what they 
were taught. The extent of student learning is assessed through observations, classroom 
quizzes, homework, written assignments, formal teacher-made tests, structured laboratory 
activities, etc. How else will teachers know if their students have learned? The local 
school system curriculum should be closely aligned with the CLG, and formative 
assessments should prepare students for the end-of-course assessment. 

Concern has been expressed that some teachers will adjust the curriculum to include only 
the content defined by the assessment limits. The “at least” portion of the original Core 
Learning Goals was designed to outline the non-negotiable content for a given course, not 
the entire course. Local principals, supervisors, and others must monitor instruction to 
ensure that the curriculum being taught meets the requirements established by the local 
system. Reasonable requirements for coverage of the curriculum, pacing, grouping, and 
other instructional decisions are developed locally. 

The 1999 Core Learning Goal documents also differ from previous versions through 
adjustments to a limited number of indicators and the removal of sample classroom 
learning activities. No changes were made in the goal or expectation statements; 
however, the language of certain indicators was modified if it was shown to be 
ambiguous or to contain multiple actions for instruction and/or assessment. In the latter 
case, the actions were split between separate indicators. For example, an indicator that 
stated that students will analyze and evaluate was divided into separate indicators for 
each verb. 
In conclusion, the CLG document represents the “core” content for both instruction and 
assessment. Local school systems should use it appropriately when making decisions 
about curriculum, instruction and assessment. 